Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

نویسندگان

  • Kaushik Chakrabarti
  • Sharad Mehrotra
چکیده

Many emerging application domains require database systems to support efficient access over highly multidimensional datasets. The current state-of-the-art technique to indexing high dimensional data is to first reduce the dimensionality of the data using Principal Component Analysis and then indexing the reduceddimensionality space using a multidimensional index structure. The above technique, referred to as global dimensionality reduction (GDR), works well when the data set is globally correlated, i.e. most of the variation in the data can be captured by a few dimensions. In practice, datasets are often not globally correlated. In such cases, reducing the data dimensionality using GDR causes significant loss of distance information resulting in a large number of false positives and hence a high query cost. Even when a global correlation does not exist, there may exist subsets of data that are locally correlated. In this paper, we propose a technique called Local Dimensionality Reduction (LDR) that tries to find local correlations in the data and performs dimensionality reduction on the locally correlated clusters of data individually. We develop an index structure that exploits the correlated clusters to efficiently support point, range and k-nearest neighbor queries over high dimensional datasets. Our experiments on synthetic as well as real-life datasets show that our technique (1) reduces the dimensionality of the data with significantly lower loss in distance information compared to GDR and (2) significantly outperforms the GDR, original space indexing and linear scan techniques in terms of the query cost for both synthetic and real-life datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Indexing Reduced Dimensionality Spaces Using Single DimensionalIndexesHeng

The dimensionality curse has greatly aaected the scalability of high-dimensional indexes. A well known approach to improving the indexing performance is dimensionality reduction before indexing the data in the reduced-dimensionality space. However, the reduction may cause loss of distance information when the data set is not globally correlated. To reduce loss of information and degradation of ...

متن کامل

یک روش مبتنی بر خوشه‌بندی سلسله‌مراتبی تقسیم‌کننده جهت شاخص‌گذاری اطلاعات تصویری

It is conventional to use multi-dimensional indexing structures to accelerate search operations in content-based image retrieval systems. Many efforts have been done in order to develop multi-dimensional indexing structures so far. In most practical applications of image retrieval, high-dimensional feature vectors are required, but current multi-dimensional indexing structures lose their effici...

متن کامل

MKL-Tree: A Hierarchical Data Structure for Indexing Multidimensional Data

Recently, multidimensional point indexing has generated a great deal of interest in applications where objects are usually represented through feature vectors belonging to high-dimensional spaces and are searched by similarity according to a given example. Unfortunately, although traditional data structures and access methods work well for low-dimensional spaces, they perform poorly as dimensio...

متن کامل

A Monte Carlo-Based Search Strategy for Dimensionality Reduction in Performance Tuning Parameters

Redundant and irrelevant features in high dimensional data increase the complexity in underlying mathematical models. It is necessary to conduct pre-processing steps that search for the most relevant features in order to reduce the dimensionality of the data. This study made use of a meta-heuristic search approach which uses lightweight random simulations to balance between the exploitation of ...

متن کامل

ND-Tree: Multidimensional Indexing Structure

The importance of multimedia databases has been growing over the last years in the most diverse areas of application, such as: Medicine, Geography, etc. With the growth of importance and of use, including the explosive increase of multimedia data on the Internet, comes the larger dimensions of these databases. This evolution creates the need for more efficient indexing structures in a way that ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000